Generating Confusion Sets for Context-Sensitive Error Correction
Abstract
In this paper, we consider the problem of generating candidate corrections for the task of correcting errors in text. We focus on correcting errors in preposition usage made by non-native English speakers, using discriminative classifiers. The standard approach assumes that the set of candidate corrections for a preposition consists of all prepositions participating in the task. We determine likely preposition confusions using an annotated corpus of non-native text and use this knowledge to produce smaller candidate sets. We propose several methods of restricting candidate sets. These methods exclude candidate prepositions that are not observed as valid corrections in the annotated corpus and take into account the likelihood of each preposition confusion in the non-native text. We find that restricting candidates to those observed in the non-native data improves both precision and recall compared to the approach that views all prepositions as possible candidates. Furthermore, the approach that takes into account the likelihood of each preposition confusion is shown to be the most effective.
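The two kinds of restriction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_prob` threshold, and the toy annotation pairs are all hypothetical, and the paper's actual methods may weight or prune confusions differently. The sketch assumes the annotated corpus can be reduced to (written preposition, correct preposition) pairs.

```python
from collections import Counter, defaultdict


def confusion_counts(pairs):
    """Count how often each written preposition was annotated with each correct one."""
    counts = defaultdict(Counter)
    for written, correct in pairs:
        counts[written][correct] += 1
    return counts


def observed_candidates(counts, written):
    """Candidates seen as corrections for `written` in the corpus, plus `written` itself."""
    return set(counts.get(written, {})) | {written}


def likely_candidates(counts, written, min_prob):
    """Keep candidates whose relative frequency P(correct | written) is at least min_prob."""
    row = counts.get(written, Counter())
    total = sum(row.values())
    keep = {written}  # the writer's own choice always stays a candidate
    if total:
        keep |= {c for c, n in row.items() if n / total >= min_prob}
    return keep


# Toy annotations (hypothetical, not taken from the paper's corpus).
pairs = (
    [("in", "in")] * 6 + [("in", "on")] * 3 + [("in", "at")] * 1
    + [("on", "on")] * 4 + [("on", "in")] * 1
)
counts = confusion_counts(pairs)

print(observed_candidates(counts, "in"))     # every correction observed for "in"
print(likely_candidates(counts, "in", 0.2))  # only the sufficiently frequent ones
```

In this toy data, "at" is observed as a correction for "in" only once in ten cases, so the observation-based set keeps it while the likelihood-based set with a 0.2 threshold drops it. A preposition never seen in the annotations falls back to a candidate set containing only itself.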
Similar resources
Performance Improvement of Dysarthric Speech Recognition Using Context-Dependent Pronunciation Variation Modeling Based on Kullback-Leibler Distance
In this paper, we propose context-dependent pronunciation variation modeling based on the Kullback-Leibler (KL) distance for improving the performance of dysarthric automatic speech recognition (ASR). To this end, we construct a triphone confusion matrix based on KL distances between triphone models, and build a weighted finite state transducer (WFST) from the triphone confusion matrix. Then, d...
A Statistical Approach to Automatic OCR Error Correction in Context
This paper describes an automatic, context-sensitive, word-error correction system based on statistical language modeling (SLM) as applied to optical character recognition (OCR) postprocessing. The system exploits information from multiple sources, including letter n-grams, character confusion probabilities, and word-bigram probabilities. Letter n-grams are used to index the words in the lexico...
Improving the Template Generation for Chinese Character Error Detection with Confusion Sets
In this paper, we propose a system that automatically generates templates for detecting Chinese character errors. We first collect the confusion sets for each high-frequency Chinese character. Error types include pronunciation-related errors and radical-related errors. With the help of the confusion sets, our system generates possible error patterns in context, which will be used as detection t...
Dysarthric Speech Recognition Based on Error-Correction in a Weighted Finite State Transducer Framework
In this paper, a dysarthric speech recognition error-correction method in a weighted finite state transducer (WFST) framework is proposed to improve the performance of dysarthric automatic speech recognition (ASR). To this end, pronunciation variation models are constructed from a context-dependent confusion matrix based on a weighted Kullback-Leibler (KL) distance between triphones. Then, a WF...
EFL Learners’ Preferences for Error Correction and Its Relationship with Demotivation and Language Proficiency in the Iranian Context
The present study is an attempt to explore any significant relationships between learners’ preferences for error correction, demotivation, and language proficiency (LP). One hundred Iranian EFL students, including both males and females, studying at the departments of foreign languages of Shahid Bahonar University of Kerman and Tehran University took part in this study. In order to obtain the r...
Publication date: 2010